Sampling rare conformational transitions with a quantum computer

Structural rearrangements play a central role in the organization and function of complex biomolecular systems. In principle, Molecular Dynamics (MD) simulations enable us to investigate these thermally activated processes with an atomic level of resolution. In practice, an exponentially large fraction of computational resources must be invested to simulate thermal fluctuations in metastable states. Path sampling methods focus the computational power on sampling the rare transitions between states. One of their outstanding limitations is to efficiently generate paths that visit significantly different regions of the conformational space. To overcome this issue, we introduce a new algorithm for MD simulations that integrates machine learning and quantum computing. First, using functional integral methods, we derive a rigorous low-resolution spatially coarse-grained representation of the system’s dynamics, based on a small set of molecular configurations explored with machine learning. Then, we use a quantum annealer to sample the transition paths of this low-resolution theory. We provide a proof-of-concept application by simulating a benchmark conformational transition with all-atom resolution on the D-Wave quantum computer. By exploiting the unique features of quantum annealing, we generate uncorrelated trajectories at every iteration, thus addressing one of the challenges of path sampling. Once larger quantum machines will be available, the interplay between quantum and classical resources may emerge as a new paradigm of high-performance scientific computing. In this work, we provide a platform to implement this integrated scheme in the field of molecular simulations.


S1: Details of the the intrinsic manifold exploration
We start by performing short unbiased sampling in each meta-stable state. Here, we assume that the interesting meta-stable states are known and that representative structures are available, as it is often the case. Each local sampling returns a set of M configurations C ini = {Q 1 , . . . , Q M }, with dim(Q i ) = 3N a . We identify the boundary of C ini in the low-dimensional intrinsic manifold defined by diffusion maps 2 . For each couple of configurations Q i and Q j , we calculate the pairwise root-meansquare-deviation (RMSD) on all non-hydrogen atoms after removing global translations and rotations, and use it as a metric, d i j = min RMSD(Q i , Q j ). We then construct a transition matrix by calculating where ε is a distance threshold that defines a notion of neighbourhood in the configuration space, here defined as ε = σ (d i j ) − ∆σ (d i j ), where σ is the average of the pairwise distances, and ∆σ its standard deviation. We use here a Gaussian kernel, which is a popular choice, but other choices are possible 2 . The constant C provides a normalization such that the sum along each row of P i j is one. Solving the eigenvalue problem for P i j , we obtain eigenvalues λ k and eigenvectors ψ k , where k = 0, . . . , M − 1. A projection of the high-dimensional data C ini onto a n N low-dimensional embedding z = {z 1 , ..., z M } is established via the components of n dominant eigenvectors 2 , i.e., each Q i gets mapped into z i = {ψ 1 (i), . . . ψ n (i)}, where ψ k (i) is the i-th component of eigenvector ψ k . We identify points on the boundary of C ini , the set of which we denote C B ini , as those defining a convex hull containing the low-dimensional embedding z.
We then generate new configurations in the unexplored regions by "shooting" beyond the boundary of the known configurations space. For each point Q B i in C B ini , we first identify the set of nearest neighbours within a given distance δ = σ (d i j )−∆σ (d i j ). For each such set, we calculate the projections of all points in the principal component analysis (PCA) representation, i.e., q i = Q i V, where lower case indicates a sample projected on the principal components, upper case in the configuration space, and V is a matrix containing the PCA loadings 1 . For every point on the boundary and its neighborhood, we generate a new unexplored configuration beyond the boundary with Here, q new , q b , and q c are respectively coordinates in the principal component projection for the new data point, boundary point, and the center of mass of the neighboring set without including the boundary point. The constant c > 0, which is adjusted heuristically, controls how far away beyond the boundary we generate new configurations. Finally, we retrieve the Cartesian coordinates of the new configuration with Q new = q new V T + Q c .
The exploration proceeds iterating between two steps: generating new configurations beyond the boundary and starting short rounds of unbiased sampling from these configurations. At every step i, we generate one new configuration for every point on the boundary of the data set of all configurations sampled up to step i − 1. We then merge together all configurations sampled up to step i and identify a new boundary, which is then used in the i + 1-th iteration.

S2: Detailed derivation of a renormalized coarse-grained dynamics
We assume the system obeys a Langevin dynamics in the overdamped limit, where Q is a point in the molecular configuration space, m and γ are the atomic mass and viscosity (assumed for simplicity to be uniform and isotropic), U(Q) is the potential energy, and η(t) is a white noise of null average obeying the fluctuation-dissipation relationship where N a is the number of atoms and D = k B T /mγ is the diffusion coefficient. The probability of observing a system described by Eq. (S.3) in a given configuration at a given time obeys the Fokker-Planck equation: The solution of this equation can be conveniently expressed in path-integral form [3][4][5] , is the so-called Onsager-Machlup functional 6 . To build a coarse-grained representation of the molecular dynamics with a low spatio-temporal resolution, it is convenient to formulate the stochastic dynamics defined by the Fokker-Planck Equation (S.5) using the mathematical formalism of quantum mechanics. Substituting P(Q,t) = e where we have introduced an effective "Dirac's constan"h eff = 2k B T γ that controls the amount of thermal fluctuations in the system andĤ eff = −¯h 2 eff 2m ∇ +V eff (Q) is an effective quantum Hamiltonian, with an effective potential Similarly, equation (S.6) can be re-written as an imaginary-time quantum Feynman propagator 7,8 P .
(S. 10) We note that the pre-factor e − 1 2k B T (U(Qf )−U(Q i )) in Eq. (S.9) does not affect the relative statistical weight of transition paths sharing identical boundary conditions.
The description of stochastic dynamics provides a natural framework to devise a systematic coarse-grained scheme, e.g. by applying renormalization group methods. In particular, we are interested in developing a low spatio-temporal resolution approximation of our original microscopic stochastic dynamics, characterized by a spatial resolution σ on the order of the typical distance between nearest neighbour configurations generated by iMapD. The time resolution ∆t of this coarse-grained theory represents the shortest time scale, thus must be much smaller than the typical time it takes the system to diffuse across neighboring configurations in C .

2/11
To lower the spatial resolution of the original stochastic theoy, we define a set of so-called coarse-grained states in the Hilbert space of the effective quantum mechanical theory: In this expression, φ (Q) denotes a fast-decaying smooth smearing function centered around 0, such that |φ (Q)| |Q| σ − −−− → 0. We determine φ (Q) by requiring the orthonormality condition where δ σ (Q) is a coarse-grained representation of Dirac's delta function, smeared to the resolution scale σ . In particular, choosing a Gaussian smearing Smearing the position eigen-states to the scale σ using Eq. (S.11) is equivalent to lowering the spatial resolution by cutting off large momentum states, with p h eff /σ . In the framework of Renormalization Group theory, this procedure is commonly referred to as "regularization". In practice, we associate each configuration Q i ∈ C with a finite region of the configuration space, identify it with a coarse-grained state |Q i }.
To assign a finite probability to coarse-grained paths ( Fig.2 in the main text), we need a path integral expression for the Feynman propagator defined on the coarse-grained states, i.e.
The explicit evaluation of this propagator is quite lengthy and is detailed in the following sub-section. The final result is: (S.14) where D cg = σ 2 /(2∆t) and V cg (Q) are the diffusion coefficient and effective potential of the coarse-grained theory, respectively. The numerical scheme we used to estimate it are also discussed in section S5.

Feynman propagator in the coarse-grained theory
In this section, we derive our coarse-grained expression for the Feynman propagator of our coarse-grained theory describing the stochastic dynamics on the intrinsic manifold. In practice, this corresponds to evaluating the matrix element of the evolution operator as a path integral performed over the coarse-grained states |Q}. In our specific implementation, the configurations specifying the center of the coarse-grained states lie on the intrinsic manifold and belong to the data-set generated with iMapD, i.e. Q ∈ C . The Feynman propagator reads h effĤ e f f is the time evolution operator. Following the standard derivation of the Feynman's path integral representation of the propagator, we begin by carrying out a Trotter decomposition: where, ∆t is a time scale of the order of the time resolution of the coarse-grained theory. Let us focus on the elementary propagator {Q n+1 |Û(∆t)|Q n }. As usual, by choosing a sufficiently small discretization time, we can ignore commutations between kinetic and potential operators and factorize the evolution operator where corrections are of higher order in ∆t. Using Eq. (S.11), we obtain (S.18)

3/11
By using the definition of the smearing function φ (Q) and performing the momentum integral, this expression is approximated as where N is an irrelevant normalization constant.
We now recall that ∆t is by definition the smallest time scale of our effective theory. Then, the Gaussian factor e − m(z−z ) 2 2h eff ∆t in Eq. (S.19) remains finite only when the distance between z and z lies within the spatial resolution of the coarse-grained theory, σ , and it can be effectively regarded as a Dirac delta-function. After carrying out the integral in dz , we obtain We can cast this expression in a more familiar form by introducing a smeared effective potential V s (Q) defined as follows: We emphasize that the average in the right-hand side is dominated by the configurations with the lowest effective potential. From the definition of V eff , it follows that these are configurations near mechanical equilibrium points, where the modulus of the total force |∇U| is very small and the Hessian of the molecular potential energy is positive. Combining all terms, the elementary time propagator becomes Some comments on this result are in order. First, we note that the elementary propagator is exponentially suppressed when the distance between Q n and Q n+1 is larger than σ . On the other hand, any structure of V s (Q) below the short scale σ is smeared out due to the average involved in its definition (S.21). So, in the last term at the exponent we may use V s where we have re-expressed the effective Dirac's constant in terms of the diffusion coefficient,h eff = 2Dm. Equation (S.23) qualitatively resembles the structure of the elementary propagator in the microscopic theory, However, it is important to note that the time discretization step dt, required to obtain the propagator in the microscopic theory, is in general orders of magnitude smaller than that of the coarse-grained theory, ∆t.
The coarse-grained propagator (S.23) can be cast in a form that is completely analog to the microscopic expression (S.24) by introducing the effective diffusion coefficient of the coarse-grained theory D cg , defined in terms of the spatio-temporal resolution scales: is the effective potential of the coarse-grained theory.

4/11
The re-scaling of the diffusion coefficient reflects the fact that the characteristic time scales of the coarse-grained theory are different from those of the original microscopic theory. To appreciate the physical origin of the rescaling, it is instructive to compute the probability for the system initially at some given position, to remain within an infinitesimal volume, after an infinitesimal time interval. To remove the dependence on normalization factors, it is convenient to compute this probability relative to the same quantity in the purely diffusive limit (i.e. for free Brownian motion). In the microscopic theory (using Eq. (S.24) ) we obtain: Thus, the ratio V eff (x)/(2Dm) is identified with a rate of escape from an infinitesimal volume centered around Q.
The same calculation can be performed in the coarse-grained theory, using Eq. (S.26): In this low-resolution theory, V cg (Q)/(2mD cg ) is interpreted as the rate of escape from a region of size σ centered in Q.
Even though the direct evaluation of V s (Q) (and therefore of V cg (x)) is computationally very challenging, we can consider approximations to make it feasible in practice. For example, it could be directly estimated by computing the rate of escape from a region of size σ centered around each coarse-grained point Q i ∈ C , using the short MD simulations generated during the iMapD exploration.
In our first exploratory application, we choose to adopt a simpler phenomenological approach: we first smeared out the short-distance structure of V eff through a self-averaging over the values of the effective potential evaluated on groups of structurally close configurations generated during the iMapD exploration. Let us denote the result of the average around point Q i with V eff Q i . Then, the effective potential of the coarse-grained theory V cg (Q i ) was estimated by introducing a suitable renormalization constant, In principle, the rescaling parameter α may be fixed in order to match global time scale, such as the average lifetime of the coarse-grained states, estimated by measuring how long it takes on average for the system to cover a distance σ in configuration space. In practice, since we were not interested in predicting time-dependent observables, the constant α was determined implicitly, by relying on the Hamilton-Jacobi formulation described in section S5.

S3.1: Exploring the intrinsic manifold
We start by sampling from two metastable states of alanine dipeptide in OpenMM. One starting from C5 region at the top left corner of Fig. S1, and from α R in the vicinity of (φ = −75 ,ψ = −20). After this initial sampling, we evaluate the diffusion map (DMAP) for each set of configurations separately to obtain a low-dimensional representation of the sampled configurations. We then identify the boundary in this representation, to initialize new simulations beyond the region that has already been sampled. We measure the pairwise distance between configurations by using the RMSD calculated on all non-hydrogen atoms after having removed global translations and rotations. The exploration proceeds by initiating unbiased sampling from each new configurations and then merging all the new data to the previous. By iterating over DMAP evaluation at every step, finding new configurations, and sampling we populate the transition region between C5 and α R .
We eventually terminate the iterations when the configurations explored starting from the two initial metastable state overlap, i.e., when at least two configurations have RMSD closer than 0.3 Å (Fig. S2(a)).

S3.2: Simulation details
In all simulations, the molecule was placed in a square box with 2.85 nm base vector, solvated with the TIP3P water model using AMBER99SB forcefield 9 . We energy-minimized the initial configuration using L-BFGS algorithm implemented in OpenMM 10 , with tolerance on the square mean root of all force components at 500 kJ (nm mole) −1 . Simulations were performed at a temperature T = 300 K, using a Langevin integrator with friction coefficient γ = 91 ps −1 and timesteps of ∆t = 2 fs. The initial sampling bursts in the two metastable states were 200 ps long starting in C5 state and 20 ps for α R . All following runs were 1 ps long. We further remove those points with higher potential energy than the median (≈ 85.84 k B T ), shown as dark pink color. (c) DMAP embedding of points whose potential is below median, shown as a function of the first two DCs.

S4.1: Data Reduction and Identification of the Nodes in the Graph
Successive rounds of explorations carpeted the transition region between the basins associated with the α R and C5 states. The number of sampled configurations needed to be reduced to obtain a sparser transition network compatible with the D-Wave quantum annealer. We first removed all configurations having a potential energy higher than the median value 85.84 k B T (Fig. S2(b)). Second, we calculated the DMAP of all remaining samples and projected them on the first two diffusion coordinates (DCs), obtaining a set of points C ini = {z i } (Fig. S2(c)). We then proceeded by identifying the two configurations in C ini with minimum RMSD distance from the initial configurations lying in two basins C5 (denoted by Q A ) and α (denoted by Q B ). Starting from z A (corresponding to Q A ), we kept the nearest point z k1 satisfying D diff (z k1 , z A ) > D thresh diff , and removed from C ini all the configurations with lower D diff . Here D diff (diffusion distance) is the Euclidean distance between points in DMAP space, defined here as the plane spanned by the two diffusion coordinates DC1 and DC2. We used D thresh diff = 7.5 × 10 −4 inspecting the histogram of nearest-neighbour pairwise diffusion distances among the points in C ini , Fig S3(a). This value ensures that for any configuration in the original data set only one representative is kept, and henceforth their nearest neighbors based on D diff -considered kinetically similar-are removed. We continued by applying the same procedure between remaining configurations in C ini and z k1 , and therefore identifying z k2 . By iterating over this step until there was no remaining element in C ini , and finally appending back z B to C ini (in case it was removed in previous iterations) we obtained a reduced version of our data set, C , containing 83 points. C approximately span uniformly the DMAP embedding of the transition between the two metastable states (Fig. S3(b)-(c)).

6/11
(c) (a) (b) Figure S3. Sub-sampling explored configurations in a kinetically uniform way (a)D diff histogram for nearest neighbors. D thresh diff = 7.5 × 10 −4 contains more than 98% of nearest neighbor pairwise distances. Reduced data projected on (b) the first two DCs and (c) Ramachandran plot.

S4.2: Building the Graph
The next step required to build a graph having as nodes the sparse set of configurations C (Fig. S4). In this graph, only nodes who correspond to configurations that are kinetically and structurally close, should be connected. We used two criteria to insure this condition. Two nodes should be connected if their diffusion distance is smaller than 0.01 and their RMSD closer than 0.8 Å. Both thresholds were chosen heuristically from the histogram of pairwise diffusion distances and RMSD calculated on the set of configurations generated by iMapD (Fig. S5).
Num. of attempts Correct topology Wrong topology Success rate 117 69 48 0.59 Table S1: Summary of the transition path generation on D-Wave to calculate the histogram reported in Fig. 4

S5: Details on the Calculation of the Graph Weights and the Determination of the Rescaling Parameter α
In the previous section, we have derived a suitable definition of the graph weights w i j that explicitly depends on the time resolution ∆t of our coarse-grained representation of the dynamics. An alternative way of fixing a time resolution is to consider the Laplace transform of the elementary propagator, In this expression, the time resolution is determined by the inverse of the frequency cut-off scale s. We adopted this regularization prescription in our illustrative application to alanine dipeptide.
To evaluate the elementary propagator in Laplace space, we resorted to the so-called Dominant Reaction Pathway (DRP) formalism 4,11 , that originates from analyzing the path integral expression of the Feynman propagator K(Q f ,t|Q i , 0) in saddlepoint approximation. The We note that the exponential pre-factor can be ignored when comparing the probability density of transition paths connecting the same initial and final configuration. Therefore, in the graph representation, we can retain only the second exponent. Equation (S.35) can be used to obtain the weights of the links in our coarse-grained graph representation of the Langevin dynamics. We obtain where s 0 is a cut-off time scale, estimated from the value of the effective potential in the reactant state Q i , i.e. s 0 ∼ −V cg (Q i ) (for a detailed discussion on this point, see 11 ). We note that an advantage of the frequency regularization (relative to the conventional time regularization) is that the resulting expression for w i j depends on the Hamilton-Jacobi functional, which is defined in terms of structural distances ∆l between configurations. The convenience of this discretization arises from the fact that there is no gap in the internal distance scales of molecular systems.
We recall that, in the present first application, we estimated the coarse-grained effective potential using V cg (Q) = α V eff Q , where · Q denotes a local average performed over the configurations near Q generated during iMapD exploration. Then, As discussed in the previous section, α could be in principle determined by matching some global time scale computed from microscopic simulations. In this work, we adopted a simpler approach and eliminated the dependence on this parameter by re-normalizing all weights w i j , by dividing them by the largest link in the graph w max . This procedure ensured that the relative probabilities of the paths in the graph are comparable and finite, as they should in the coarse-grained theory.

S6: Auto-Correlation function
The function G(N) used to estimate the auto-correlation along the Markov chain of our hybrid Monte Carlo algorithm is defined as follows, .

(S.38)
In this equation, the Γ (2) (k) represent the collection of all binary link variables generated at the k-th Monte Carlo step and we are implicitly assuming periodic boundary conditions, i.e., Γ i j (N MC + 1) = Γ (2) (1) i j . Here, N MC is the number of Monte Carlo steps, ∑ denotes a summation restricted to the ε pairs of indexes i and j that label topologically adjacent sites in the graph and |E | denotes the number of edges present in the graph. Finally, N is the distance in the Monte Carlo chain. The average · is intended over many Monte Carlo trajectories. In practice, however, the computing time that was available to us on the D-Wave quantum computer was sufficient to generate only 3 Monte Carlo trajectories. Since such a limited statistics does not allow us to estimate the averages in Eq. (S.38), we chose not to perform the average and directly analyze the behavior of G(N) for each independent trajectory.
Due to the absence of autocorrelation, the cost of generating an ensemble of N independent transition paths scales like ∼ Ns a , where s is the cost of a single Monte Carlo step and a is the average acceptance ratio. In our case, s amounts to about 120 10/11 seconds of total computing (of which about 6 seconds of quantum computing) and a 0.6. Therefore, to produce an ensemble of about 100 independent transition paths for this system, our Monte Carlo scheme would require slightly more than 5 hours of total computing time, including about 15 minutes of QPU time.